Perplexity has released an AI assistant app for Mac that can directly access the file system and native applications. It supports text and voice interaction, can be activated with keyboard shortcuts, is aware of the user's current window, and proactively suggests actions, pushing AI interaction deeper into the operating system.
The newly released LPM1.0 model generates video of a person speaking, listening, and singing in real time from a single reference image. Its core advance is multimodal processing: it combines text, audio, and images in sync to produce dynamic scenes with accurate lip synchronization, subtle expressions, and natural emotional transitions. The model can be integrated with mainstream voice AI systems such as ChatGPT, turning traditional voice conversations into real-time interactive experiences with visual feedback.
Meitu's AI agent RoboNeo now integrates Seedance2.0, moving from single-step generation to systematic workflows. It supports text-to-video, image-to-video, frame control, and reference-based generation, which can be flexibly combined to make AI short-video creation more efficient...
The MIIT has opened a public comment period on 121 industry standard plans, with a focus on regulating the application security of AI model context protocols. The goal is to use standardization to address protocol compatibility and data security issues that large models face in multimodal interaction, long-text processing, and cross-platform calling. The move marks a significant step in China's efforts to standardize underlying AI protocols and build out its security regulation system.
An AI detection bypass tool that converts AI-generated text into human-like content, successfully bypassing major AI detection systems.
An open-source text-to-speech system dedicated to achieving natural human speech.
An industrial-grade, controllable, and efficient zero-shot text-to-speech system
A text-to-image generation system based on cascaded diffusion
Google
$0.49
Input tokens/M
$2.1
Output tokens/M
1k
Context Length
OpenAI
$2.8
$11.2
xAI
$1.4
$3.5
2k
$7.7
$30.8
200
-
Anthropic
$105
$525
$0.7
$7
$35
$17.5
$21
Alibaba
$4
$16
Baidu
128
$6
$24
256
$1
$10
ByteDance
$1.2
$3.6
4
$2
redis
This cross-encoder model was fine-tuned on the LangCache sentence pair dataset with the sentence-transformers library, starting from the Alibaba-NLP/gte-reranker-modernbert-base model. It computes semantic similarity scores for text pairs and provides efficient text matching and reranking for the LangCache semantic cache system.
openbmb
VoxCPM is an innovative tokenizer-free end-to-end text-to-speech (TTS) system that overcomes the limitations of discrete tokenization by modeling speech in a continuous space. It has two core capabilities: context-aware speech generation and realistic zero-shot voice cloning. It can automatically adjust the prosody and style according to the text content and clone the speaker's timbre, accent, and emotion with just a short reference audio.
This is a cross-encoder semantic reranking model fine-tuned specifically for the Redis LangCache semantic caching system. It computes similarity scores for text pairs and is suited to sentence-pair classification and semantic similarity tasks.
This is a dual-encoder sentence embedding model released by Redis and optimized for semantic caching tasks. It is fine-tuned based on sentence-transformers/all-MiniLM-L6-v2 and can map text to a 384-dimensional vector space, specifically designed to improve the query matching accuracy of the LangCache semantic caching system.
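As a rough illustration of how a semantic cache might use such embeddings, the sketch below stores answers keyed by vectors and returns a cached answer when a new query vector is similar enough. It is not LangCache's implementation: the `SemanticCache` class, the 0.9 threshold, and the 3-dimensional toy vectors (standing in for 384-dimensional embeddings) are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical toy cache: reuse an answer when a query embedding is close enough."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed cutoff, would be tuned in practice
        self.entries = []           # list of (embedding, answer) pairs

    def put(self, embedding, answer):
        self.entries.append((embedding, answer))

    def get(self, embedding):
        # Linear scan for the most similar stored embedding.
        best_score, best_answer = 0.0, None
        for stored, answer in self.entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


# Toy 3-dimensional vectors stand in for real 384-dimensional embeddings.
cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0, 0.0], "cached answer A")
cache.put([0.0, 1.0, 0.0], "cached answer B")

hit = cache.get([0.9, 0.1, 0.0])   # close to the first entry
miss = cache.get([0.5, 0.5, 0.7])  # not close enough to either entry
print(hit, miss)
```

In a real deployment the vectors would come from the embedding model and the linear scan would typically be replaced by a vector index.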
nvidia
NVIDIA GPT-OSS-120B Eagle3 is an optimized version based on the OpenAI gpt-oss-120b model. It adopts the Mixture of Experts (MoE) architecture, with a total of 120 billion parameters and 5 billion active parameters. This model supports both commercial and non-commercial use and is suitable for text generation tasks, especially for the development of AI Agent systems, chatbots, and other applications.
Lambent
Mira is a text generation model based on the fusion of multiple Gemma 3 27B base models. Through carefully selected training data and specific training methods, it has a unique ability to generate poetic texts. This model performs excellently in role-playing and creative writing, and can generate texts with literary charm according to different system prompts.
GenMedLabs
XTTS v2 GGUF is a memory-efficient text-to-speech system optimized for mobile devices. It uses a C++ inference engine to achieve ultra-low memory usage and fast loading.
gguf-org
vibevoice-gguf is a text-to-speech system based on the Microsoft VibeVoice-1.5B model. It runs through the gguf-connector and can convert text into natural speech. It supports voice cloning and multi-speaker voice generation.
ImrozeAslam
Hunyuan3D 2.0 is an advanced large-scale 3D synthesis system for generating high-resolution textured 3D assets.
unsloth
Llasa is a text-to-speech (TTS) system based on LLaMA, which extends the capabilities of the language model by integrating speech tokens, supporting Chinese and English speech generation.
Spark-TTS is an efficient text-to-speech system based on large language models (LLM), supporting bilingual synthesis in Chinese and English with zero-shot voice cloning.
prince-canuma
Spark-TTS is an advanced text-to-speech system based on large language models, capable of high-precision and natural-sounding speech synthesis.
AvaLovelace
LegoGPT is the first AI system that generates physically stable LEGO brick models from text prompts, fine-tuned based on Llama-3.2-1B-Instruct.
miscovery
A multilingual transformer model based on the encoder-decoder architecture, supporting tasks such as text summarization, translation, and question-answering systems.
mirth
Chonky is a Transformer model capable of intelligently splitting text into meaningful semantic chunks, suitable for RAG systems.
thinhkosay
Spark-TTS is an advanced text-to-speech system that leverages the powerful capabilities of large language models (LLMs) to achieve highly accurate and naturally fluent speech synthesis.
Chonky is a Transformer model that intelligently splits text into meaningful semantic chunks for RAG systems.
Chonky is a Transformer model that intelligently segments text into meaningful semantic chunks, suitable for RAG systems.
DragonLineageAI
Spark-TTS is an advanced text-to-speech system that leverages the powerful capabilities of large language models (LLMs) to achieve high-precision and natural-sounding speech synthesis.
Compumacy
An advanced large-scale 3D synthesis system developed by Tencent for generating high-resolution textured 3D assets
A text editing system integrated with Claude Desktop that provides text selection, editing, and automatic replacement at the macOS system level through MCP, with support for custom prompts and desktop notifications.
rag-mcp is an over-designed retrieval-augmented generation system that provides multiple text search modes (semantic search, question-answer search, style search) through a Python server. It uses PostgreSQL and pgvector to store text embedding vectors, supports interaction with AI agents, and has a complex but scalable architecture.
A document retrieval system based on MongoDB Atlas vector search and Voyage AI embedding technology, supporting semantic search and text matching, including document chunking, embedding generation, and storage functions.
A text-to-speech MCP server based on the Rime API, providing system audio playback functionality.
native-devtools-mcp is a cross-platform MCP server that provides AI agents with the ability to automate control of macOS, Windows, and Android systems, including screenshot, OCR text recognition, simulated click input, window management, and Android device control.
The TSAP MCP Server is a text search and analysis processing system based on the Model Context Protocol (MCP), providing standardized interface services for code intelligence and text analysis. The project consists of three major components: core TSAP functions, tool APIs, and the MCP adaptation layer. It supports various functions such as text search, code analysis, and data processing, and can be seamlessly integrated with MCP clients such as Claude Desktop.
An enterprise-level AI assistant system based on the Model Context Protocol, with intelligent server selection, text analysis, code review, sentiment analysis, and knowledge management functions, providing an aesthetically pleasing web interface.
A high-performance MCP server implemented in Go, providing AI assistant capabilities and system tool integration, with support for secure command execution, file operations, and text editing.
File Search MCP is a dedicated MCP server written in Rust that provides full-text search over text files in the file system. It uses the Tantivy search engine for efficient indexing and retrieval.
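To give a sense of what full-text indexing and retrieval involve, here is a minimal inverted-index sketch. It is a plain-Python illustration of the general technique, not Tantivy's API or File Search MCP's code; the `InvertedIndex` class, the tokenizer, and the sample file names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical toy index mapping each term to the documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


index = InvertedIndex()
index.add("notes.txt", "Rust makes systems programming safer")
index.add("todo.txt", "Learn Rust and build a search engine")
index.add("readme.txt", "A search engine written in Go")

rust_docs = index.search("rust")
both_terms = index.search("search engine")
no_match = index.search("rust go")
print(rust_docs, both_terms, no_match)
```

Production engines add relevance scoring, on-disk segments, and incremental updates on top of this basic term-to-document mapping.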
An MCP server based on TypeScript that implements a Retrieval Augmented Generation (RAG) system for local documents, supporting querying and indexing of Git repositories and text files.
A file system operation server based on the MCP protocol, providing functions such as directory management, file reading and writing, text analysis, duplicate file search, and compression and decompression.
This project is a community-maintained collection of MCP servers, providing various functional services such as text search, HTTP requests, and system operations, which can be installed and managed through the CLI tool.
Rime MCP is a text-to-speech service based on the Rime API that provides speech synthesis and playback through the system's native audio player.
An MCP server implementation integrating the 4o-image API, supporting image generation and editing by LLMs and AI systems through a standardized protocol, including functions such as text-to-image generation and image editing.
A fully functional MCP server that offers 73 tools covering 11 modules including file system, diagnostics, scripts, time management, network, context, Git operations, user input, version control, clipboard, and text conversion.
This project is a community-maintained collection of MCP servers, providing various functional services such as text search, HTTP requests, and system operations, which can be easily installed and used via CLI tools.
Memento is a knowledge graph memory system based on SQLite, providing persistent memory functions, supporting full-text retrieval and semantic search, and realizing intelligent context retrieval through BGE-M3 embedding. It is suitable for technical and creative project management.
This project is an MCP server used to integrate Google's Gemini model with Claude Code to achieve collaboration between the two AI systems. It provides functions such as direct query, collaborative brainstorming, code analysis, text analysis, content summarization, and image prompt generation.
An MCP server that provides comprehensive audio playback functionality for macOS, supporting system sounds, text-to-speech, and custom audio file playback, suitable for MCP clients such as AI assistants.
Dungeon MCP Server is a text-based dungeon adventure game server based on the MCP protocol. It provides functions such as dungeon exploration, NPC interaction, combat system, and player data management, supporting RESTful API and custom configuration.